ASR - articulatory speech recognition

Authors

  • Joe Frankel
  • Simon King

Abstract

The hidden Markov model (HMM) is the model that has made large-vocabulary automatic speech recognition (ASR) possible. The HMM is robust and versatile, and has at its disposal a host of efficient algorithms for training, speaker adaptation and recognition. However, there is nothing uniquely speech-orientated about the HMM. In fact, it makes certain assumptions about speech which are known to be untrue. For example, speech is modelled as a piecewise stationary process when we know it to be continuous. Also, coarticulation, which should be a rich source of information, is treated simply as unwanted variation. This variation is generally accounted for by modelling every phone in every context, which in turn leads to problems of data sparsity and makes elaborate parameter-tying schemes necessary.

Speech is generally modelled in a parametrised version of the acoustic domain, which is natural given that this is the data to which we have most ready access. Any practical speech recogniser must of course take acoustic waveforms as input; however, treating these in isolation from the production mechanism which created them ignores a rich source of prior knowledge. We propose that modelling speech in the articulatory domain using linear dynamic models (see section 4) will address some of these issues. The data here consist of trajectories which evolve smoothly over time, namely the coordinates of points on the articulators. Effects such as coarticulation and assimilation are most simply described in articulatory terms, as opposed to acoustic terms, where they are confounded with the representation. Models that work in the articulatory domain are therefore able to model these phenomena explicitly.

We have access to real articulatory data, collected by Alan Wrench at Queen Margaret College, Edinburgh (see [1] for further details). These data have been used to train neural networks to recover articulatory traces from the acoustics. In our experiments we have used both real and automatically recovered articulation.
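
To illustrate the kind of model the abstract refers to, the following is a minimal sketch of a scalar linear dynamic model in its standard state-space form, together with a Kalman filter that scores an observation sequence under the model. The state-space form, the function names (`simulate_ldm`, `kalman_loglik`) and all parameter values are assumptions chosen for illustration; they are not taken from the paper, which presents its own formulation in its section 4.

```python
import math
import random


def simulate_ldm(a, h, q, r, x0, n, seed=0):
    """Draw n observations from a scalar linear dynamic model (hypothetical parameters).

    State:       x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = h * x_t + v_t,      v_t ~ N(0, r)
    """
    rng = random.Random(seed)
    x, ys = x0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(h * x + rng.gauss(0.0, math.sqrt(r)))
    return ys


def kalman_loglik(ys, a, h, q, r, m0, p0):
    """Run a scalar Kalman filter over ys.

    Returns the filtered state means and the log-likelihood of ys under the
    model, starting from a Gaussian prior with mean m0 and variance p0.
    """
    m, p = m0, p0
    means, loglik = [], 0.0
    for y in ys:
        # Predict the next state from the smooth dynamics.
        m_pred = a * m
        p_pred = a * a * p + q
        # Innovation: how far the observation is from the prediction.
        e = y - h * m_pred
        s = h * h * p_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        # Correct the prediction using the observation.
        k = p_pred * h / s
        m = m_pred + k * e
        p = (1.0 - k * h) * p_pred
        means.append(m)
    return means, loglik


# Illustrative use: score a smooth synthetic trajectory under the generating model.
ys = simulate_ldm(a=0.95, h=1.0, q=0.01, r=0.05, x0=1.0, n=50)
means, ll = kalman_loglik(ys, a=0.95, h=1.0, q=0.01, r=0.05, m0=1.0, p0=0.1)
```

In a recogniser built along these lines, one such model per segment class could in principle be compared by log-likelihood; that usage is likewise an assumption here, not a description of the authors' system.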

Related resources

Improving Robustness of Speech Recognition Systems

Title of Dissertation: Articulatory Information for Robust Speech Recognition. Vikramjit Mitra, Doctor of Philosophy, 2010. Dissertation directed by: Dr. Carol Y. Espy-Wilson, Department of Electrical and Computer Engineering. Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as human speech recognition due to their lack of robustness against speech vari...

Speech recognition with phonological features: some issues to attend

It is often argued that acoustic-phonetic or articulatory features could be beneficial to automatic speech recognition because they provide a convenient interface between the acoustic and the linguistic level. Former research has shown that a combination of acoustic and articulatory information can lead to improved ASR. However, there exists no purely articulatory driven ASR system that outperfo...

Speech Recognition for the iCub Platform

This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii) to build deep learning-based models specifically addressing the main challenges an A...

Integrating Articulatory Features into Acoustic Models for Speech Recognition

It is often assumed that acoustic-phonetic or articulatory features can be beneficial for automatic speech recognition (ASR), e.g. because of their supposedly greater noise robustness or because they provide a more convenient interface to higher-level components of ASR systems such as pronunciation modeling. However, the success of these features when used as an alternative to standard acoustic...

Production Knowledge in the Recognition of Dysarthric Speech

Production knowledge in the recognition of dysarthric speech. Frank Rudzicz, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2011. Millions of individuals have acquired or have been born with neuro-motor conditions that limit the control of their muscles, including those that manipulate the articulators of the vocal tract. These conditions, collecti...

Automatic Speech Recognition on Vibrocervigraphic and Electromyographic Signals

Automatic speech recognition (ASR) is a computerized speech-to-text process, in which speech is usually recorded with acoustical microphones by capturing air pressure changes. This kind of air-transmitted speech signal is prone to two kinds of problems related to noise robustness and applicability. The former means the mixing of speech signal and ambient noise usually deteriorates ASR performan...

Publication date: 2001